library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(forecast)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
library(TTR)
library(ggplot2)
library(tseries)
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
covid <- read.csv("D:/emotions/archive (1)/covid_19_indonesia_time_series_all.csv")
rmarkdown::paged_table(covid)
The data has 4,578 observations and 37 columns.
# Data Preparation and EDA
# Keep only the DKI Jakarta rows and the ï..Date and New.Cases columns,
# then convert the date column from character to Date with mdy()
covid1 <- covid %>%
  subset(Location == "DKI Jakarta") %>%
  select(ï..Date, New.Cases) %>%
  mutate(ï..Date = mdy(ï..Date))
# Rename the first column from ï..Date to Date
colnames(covid1)[1] <- "Date"
# After subsetting, the data must not contain NA (missing values)
# Check whether any column has missing values
colSums(is.na(covid1))
## Date New.Cases
## 0 0
The March data contains many zero values, so we'll use the data starting from April. The data must also be ordered by time.
covid1 <- covid1 %>%
  subset(Date >= as.Date("2020-04-01"))
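Because `ts()` assumes equally spaced observations in time order, it can help to confirm that the dates are sorted and have no gaps before converting. The following is a minimal sketch on a small made-up data frame; `demo` and its values are hypothetical and only mimic the column names of `covid1`.

```r
# Hypothetical daily data with the same column names as covid1
demo <- data.frame(
  Date = as.Date("2020-04-01") + c(0, 1, 2, 4, 3),  # deliberately out of order
  New.Cases = c(10, 12, 9, 15, 11)
)

# Sort by date so the observations are in time order
demo <- demo[order(demo$Date), ]

# Differences between consecutive dates, in days
day_gaps <- as.numeric(diff(demo$Date))

# TRUE if the dates are strictly increasing
is_ordered <- all(day_gaps > 0)

# TRUE if every consecutive pair is exactly one day apart (no missing days)
has_no_gaps <- all(day_gaps == 1)

is_ordered
has_no_gaps
```

The same two checks could be run on `covid1$Date`; if either fails, the series would need sorting or gap filling before it is passed to `ts()`.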
Then we'll check when the data starts and ends.
range(covid1$Date)
## [1] "2020-04-01" "2021-11-05"
Now our data runs from April 1, 2020 to November 5, 2021. Next, we'll convert the data to a ts object.
# Create the ts object
covid_ts <- ts(data = covid1$New.Cases,
               start = min(covid1$Date),
               frequency = 7) # weekly seasonality
# Visualise the covid_ts object
covid_ts %>% autoplot()
From the plot, the series appears to be multiplicative. Next, we'll decompose it to inspect its trend, seasonal, and error components.
covid_dc <- decompose(covid_ts, type = "multiplicative")
covid_dc %>% autoplot()
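In a multiplicative decomposition, the series is modelled as the product trend × seasonal × random. The sketch below uses a synthetic series (`y` and all of its values are made up for illustration, not taken from the COVID data) to show that the components returned by `decompose()` multiply back to the original series.

```r
# Synthetic daily series: rising level times a weekly pattern (made-up values)
set.seed(1)
n <- 70
level <- seq(50, 150, length.out = n)
weekly <- rep(c(0.8, 0.9, 1.0, 1.1, 1.2, 1.1, 0.9), length.out = n)
y <- ts(level * weekly * exp(rnorm(n, sd = 0.05)), frequency = 7)

dc <- decompose(y, type = "multiplicative")

# Product of the three components; NA at both ends, where the
# moving-average trend cannot be computed
rebuilt <- dc$trend * dc$seasonal * dc$random
ok <- !is.na(rebuilt)
max_abs_diff <- max(abs(rebuilt[ok] - y[ok]))

# dc$figure holds the 7 seasonal indices, which average to 1
# in the multiplicative case
seasonal_mean <- mean(dc$figure)

max_abs_diff
seasonal_mean
```

The reconstruction error should be at the level of floating-point rounding, which reflects how `decompose()` defines the random component as whatever is left after dividing out the trend and seasonal parts.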
We can also see whether the trend goes up or down by using the seasonally adjusted series, which removes the seasonal effect from the data.
# For a multiplicative decomposition, the seasonal effect is removed by division
autoplot(covid_dc$x / covid_dc$seasonal)
From the plots above, we can see that the data has no clear seasonal pattern, and that the trend is increasing.
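One rough way to back up a visual judgement about seasonality is to look at the seasonal indices stored in the `figure` element of a decomposition: in the multiplicative case, indices close to 1 suggest a weak weekly effect. The sketch below compares two synthetic series (both made up for illustration); it is an informal check under these assumptions, not a formal test.

```r
set.seed(42)
n <- 140
level <- seq(100, 300, length.out = n)
noise <- exp(rnorm(n, sd = 0.05))

# One series with a clear weekly pattern and one without (both made up)
weekly <- rep(c(0.7, 0.9, 1.0, 1.1, 1.3, 1.1, 0.9), length.out = n)
with_season <- ts(level * weekly * noise, frequency = 7)
no_season <- ts(level * noise, frequency = 7)

# Seasonal indices from a multiplicative decomposition
idx_with <- decompose(with_season, type = "multiplicative")$figure
idx_none <- decompose(no_season, type = "multiplicative")$figure

# Spread of the indices: a larger spread means a stronger weekly effect
spread_with <- diff(range(idx_with))
spread_none <- diff(range(idx_none))

spread_with
spread_none
```

Applied to the real data, the equivalent quantity would be `diff(range(covid_dc$figure))`; how small it must be to count as "no seasonality" is a judgement call.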